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Abstract 

Image processing is an increasingly important aspect 
for analysis of data from X and 7-ray astrophysics 
missions. In this paper, I review a method proposed 
by Kebede (L. W. Kebede 1994, ApJ, 423, 878), and 
point out an error in the derivation of this method. 
It turns out that this error is not debilitating - the 
method still "works" in some sense - but as published 
is rather sub-optimal, as demonstrated both on the- 
oretical grounds and via a set of examples. 



1 Introduction 

Image processing techniques are increasingly being 
employed to aid in the interpretation of data from 
X and 7-ray astrophysics missions, both to suppress 
noise from low photon statistics and to invert instru- 
mental responses when required. An excellent exam- 
ple of t his is for Compton tele scopes, such as COMP- 
TEL ([ Schbnfclder et a/.1993 |), where directional in- 
formation of detected photons has a complex relation- 
ship to the measured quantities, source count rates 
are relatively low, and background is high. 

In Kebede (1994) (hereafter referred to as K94), 
a method is presented for unfolding data from the 
instrumental response. Further examination of K94 
reveals certain mathematical and conceptual errors. 
I will review and correct these in this paper, show 
specific examples of the application of the proposed 



method and compare with other similar simple al- 
gorithms. Some claims of Kebede (1994) are dis- 
cussed, and those which are incorrect are noted. 
Interestingly, the claim that "The method totally 
eliminates the possibility of any error amplification" 
([Kebede 1994 1 ) , while misleading, turns out to be 
true. However, we also show that while the method 
of K94 may not amplify noise in the data, it also does 
nothing to suppress noise, rather passing it through 
to the estimate. Whether or not this is considered 
a useful property may be a function of one's partic- 
ular application, though it seems for most scientific 
studies that some suppression of the noise (easily ac- 
complished) is desirable. 

To facilitate direct comparison with K94, we 
shall adopt its somewhat non-standard notation and 
nomenclature. The transpose of a matrix R (termed 
the "converse" by Kebede) is denoted with a tilde, 
i.e., R = R T . The matrix factors from the singular 
value decomposition of R — USV T are instead given 
as AAB. 



2 Mathematical Formalism 

The problem under consideration is essentially that 
of inverting a discretized version of a linear integral 
equation, given an instrument response kernel and 
data perturbed by noise. This may be formulated as 
a matrix equation 

Rs~d (1) 
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where R is an M x N matrix describing the discretized 
instrument response, d is an M-vector representing 
the binned data, and s is an N- vector respresenting 
the the source field flux distribution, usually speci- 
fied as a set of pixels (note that K94 uses S and D; 
we have changed to lower case to avoid confusion be- 
tween matrices and vectors) . The approximate equal- 
ity is a result of noise. If M < N, then equality can 
potentially be achieved, but this is not usually a desir- 
able goal, for reasons which are generally well-known 
([ Groetsch 1984 1 ) . In particular, this leads to a se- 
rious overfit of the data, accounting for every little 
feature whether simply noise or not. 

The approach in K94 is to find constants A, and 
vectors a 1 and b l such that 



Kb 1 = \a l 
Ra l = A;6\ 



(2) 



Though K94 never refers to it as such, eqs. || define 
the singular value decomposition (SVD) of the matrix 
R, with Xi being the singular values, a 1 the left singu- 
lar vectors, and bi the right singular vectors. The a 1 
form a complete orthonormal basis, as do the b l . K94 
uses the rather confusing term "eigenvalues" for the 
Xi, as opposed to the more standard "singular val- 
ues" . The Xi are the square roots of the eigenvalues 
of RR, but these are not the eigenvalues of R, which 
are undefined for M ^ N. 

K94 suggests least-squares solution of eqn. [|, given 
schematically by 

s = R- X d. (3) 

Note that this makes no reference to the data statis- 
tics. Data bins with low expected counts will 
receive equal weight as those with high expected 
counts. We address this point later. For non- 
square R the standard matrix inverse is not de- 
fined; however one can use the generalized left in- 
verse ( Campbell & Meyer 1979 1 ) . Equation |^ im- 
plies R = AAB, with A and B orthonormal ma- 
trices whose columns are the a 1 and b l respec- 
tively, and A = diag(A^). As is well-known 
([ Campbell fc Meyer 1979] ), the generalized left in- 
verse of R is given by 



Substituting this in eqn. ||| yields the least-squares so- 
lution for s, i.e., the s which minimizes \\Rs — d\\ 2 . 
If R is singular, the least-squares solution is not 
unique, and A contains zero elements on the di- 
agonal. The SVD inverse is computed by setting 
A -1 = diag(l/Aj) for X, ± 0, and otherwise. The 
SVD inverse then chooses the solution which mini- 
mizes the Euclidean norm of s (given by ||s|| 2 ). 

Linear inverse theory tells us that the solu- 
tion of eqn. |l| is almost inevitably unstable for 
the types of systems we encounter experimentally 
(| Groetsch 1984[ ). This instability is related to the 
fact that our instrument has finite resolving power, 
and nearby pixels may have a high degree of potential 
confusion. The singular value spectrum reflects this, 
generally decaying rapidly, and such matrices are 
termed ill-conditioned (|Golub & Van Loan 1989 1 ) . 
From eqn. |] we see that a small singular values makes 
a large contribution to the generalized inverse, and 
this tends to lead to noise amplification. To re- 
duce these effects, one must regularize the inverse, 
which for our purposes means suppressing the ef- 
fects of small singular values. This is the claim for 
the method of K94 which, as it turns out, actually 
achieves this in an odd fashion. 

3 The Method 

The derivation of K94 is limited to the case M > N 
and non-singular R, for which the solution of eqn. [l| 
is 

s = (RR^Rd. (5) 

K94 then notes that s and Rd can be related via 
a diagonal matrix (s and Rd are both N- vectors). 
Note that this is not the matrix (RR)' 1 , which is 
very definitely not diagonal for the cases of interest. 
To be more specific, one can relate s and Rd via a 
diagonal matrix, but this matrix necessarily depends 
on R and d. If we denote this matrix as T, then a 
little algebra shows that for s given by eqn. ||, 



Ti 



kk = — = = = ■ (01 

(Rd) k (Rd) k 

Thus, finding T and finding s are equivalent opera- 
(4) tions when estimating the unregularized inverse. K94 



{{RR)- 1 Rd) k 



Rl 1 = BA~ 1 A. 
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asserts that T is non-negative, but this will not be 
true in general, since the s given by eqn. |B| is not non- 
negative. In fact, the noise amplification due to small 
singular values guarantees large positive and negative 
oscillations, which is exactly the problem one is trying 
to mitigate. As we shall see below, eqn. || suggests an 
iteration for estimating T (or s) which under certain 
conditions does force T (s) to be non-negative, but in 
this case s is no longer the least-squares solution to 
eqn. 0. 

K94 goes on the suggest the following iteration on 
s (K94, eqn. (11)): 



,(«+!) = T {n+l) Rd. 



(7) 



The next question is how to compute T^ n+1 ^ from 
R,d, and s (n) . Equation (12) of K94 gives the answer 
as 



n("+l) 
-kk 



» 



{Rdt ] ' 



(8) 



K94 does not elucidate on the meaning of {Rd)^\ 
in particular what the superscript means, since Rd is 
just given by R and d and has nothing to do with the 
iteration. Referring to eqn. |(| we might guess that 
this is supposed to mean what Rd would be if 
were the "true" image, in which case the iteration 
would be 



r ("+l) 
l kk 



» 



(9) 



{RRs^) k 

The next step in the derivation apparently contains 
an algebraic error. K94 (eqn. 13) gives the following 
expression: 

RRB = BA, (10) 

which is simply incorrect. Referring to the SVD of 
.R, and remembering that A and B are orthonormal 
matrices (i.e., AA is the identity), we actually find 



RRB = BAAAABB = BA 2 



(11) 



This difference then explains the form of eqn. (14) of 
K94, which gives the iteration as 



>+i) 



» 



(Rd) k 



(n)' 



(12) 



Note that the denominator of eqn. n3 is just 



(BABs 



If one referred to eqn. hOL and incor- 



rectly substituted RR = BAB into eqn. ^, eqn. [T! 
would result. 



4 Discussion 

Following the derivation of eqn. |l2| (K94, eqn. (14)), 
the author makes the following statement: "Notice 
how the solution given in equation (14) virtually 
ignores the small eigenvalues" ; remember that by 
"eigenvalues" Kebede is actually referring to the sin- 
gular values of R. Let us examine this statement 
further, especially in light of the errors leading to 
eqn. [l2|. 

We begin by considering the "correct" version of 
eqn. O, given by 



» 



(Rd)t 



{RRs( n )) k 



(13) 



where we've simplified the notation a bit and haven't 
bothered expanding RR in terms of its SVD factors, 
since there's no compelling reason to do so. In fact, 
one encounters many cases where R is too large to 
explicitly construct, but where matrix- vector prod- 
ucts can be calculated via fast implicit algorithms. A 
simple example of this would be an imaging system 
with a translationally-invariant point-spread function 
(PSF), for which the products Rs and Rd can be cal- 
culated via the FFT and convolution theorem. For a 
large number of image/data pixels, explicit calcula- 
tion and use of the SVD factors would involve massive 
computational overhead. 

Since we are not advocating use of eqn. O, we 
won't discuss the convergence properties, except 
to note that in numerical experiments it converges 
monotonically in the squared residual \ \Rs — d\\ 2 . Of 
more interest is what it converges to. Not surpris- 
ingly, the stable fixed point of the iteration is just 
the solution given by eqn. ||; proof of this is simple 
and follows directly from eqn. [l3|. The iteration also 
has saddle points, the trivial example being s = 0, 
which does lead to an interesting point. For R, d > 
(as is often the case for imaging systems), if the ini- 
tial value s(°> is positive then eqn. n~3] converges to a 
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saddle point for which s is non-negative. This s is not 
the solution to eqn. ||, but that's a good thing, since 
the non-negativity constraint is not only physical, but 
also serves to stabilize the solution for ill-conditioned 



R ([Dixon et 0/1996]) 



Let us now consider eqn. |12| at face value, and sec 
what it implies for the estimation of s. From eqn. |l2| 
the convergence condition s^ n+1 ' = gives the so- 
lution as 



,(F) 



,(F) 



(Rd) k 



(14) 



where the superscript (F) denotes the value at con- 
vergence, i.e., the fixed point. Solving for s^- F \ again 
using the SVD factorization of R and the orthonor- 
mality of B, we find the stable fixed point to be 



BA~ 1 BRd 
BK~ l BBKAd - 



BAd. 



(15) 



Interestingly, we see from eqn. [T| that not only does 
the iteration of eqn. [l^ "virtually (ignore) the small 
(singular values)" , but in fact ignores them com- 
pletely! So the claim of K94 that this "method totally 
eliminates . . . error amplification" is true, strictly 
speaking, since it is A~ x which leads to this phe- 
nomenon. On first glance a reader might take this 
statement to mean that noise itself is eliminated in 
the estimate, which is not the case. The orthonor- 
mality of A and B imply that noise is propagated 
through to the estimate. If the noise were white, this 
orthonormality implies that the noise in the estimate 
is also white and of the same magnitude as in the 
data. The situation is less clear for Poisson noise, 
but generally one might expect the image pixel noise 
to be similar on average to that in the data pixels; 
examples given below will illustrate this. Note that 
the non-negativity implied by eqn. [l^ implies some 
noise suppression, but only for images where the av- 
erage intensity level is somewhat larger than the noise 
level. 

Equation fl5| represents a regularized estimate of s, 
and might be considered a variant on the damped 



SVD (DSVD) method ([Ekstrom k Rhodes 1974]) 



To obtain the DSVD estimate, one purposely sup- 
presses the effect of small singular values with a 



damping function. Here, our damping function would 
be simply the singular values themselves, so they 
cancel out the inverse. Naively this may sound 
like a good idea, but it actually ignores a lot of 
the information provided by the singular values 
([Groetsch 1984 1). If nothing else, the regulariza- 



tion provided by use of eqn. 15 is always the same 
- we have no control over it. Yet in actual situ- 
ations, we need to control the amount of regular- 
ization, since we'd apply more or less depending 
on the signal-to-noise of the data. DSVD meth- 
ods in general allow such control via the specifica- 
tion of a cutoff value, where the damping function 
drops to zero (or very small values). We might 
conceive of controlling the amount of regulariza- 
tion by stopping the iteration early. This is of- 
ten employed in iterative schemes such as conju- 



gate gradient ([van der Sluis & van der Vorst 1990 1 ) , 
LSQR ( fPaige fc Saunders 1982j[ )7 expectation max- 
imization ([Knodlseder 1997|), and Maximum En- 
tropy ( |Knodlseder 1997 ), where it is found either 
empirically or rigorously that the early iterations 
tend to pick out the statistically interesting struc- 
ture, while the later ones tend to just amplify the 
noise. For the iterations described herein and in K94 
we don't pursue this mathematically, but show ex- 
amples of below. 



5 A brief note on statistics 

As we stated above, the formulation of K94 makes 
no reference to the data statistics, nor the statistical 
interpretation of the converged solution. The formu- 
lation is easily modified, though, so that eqn. [li] is 
directly related to Maximum Likelihood estimation 
of the pixel fluxes. Consider modification of eqn. [l| 
to the following: 



Q~ l ' 2 Rs ~ Q~ 1/2 d, 



(16) 



where Q is the (symmetric positive-definite) covari- 
ance matrix of the noise in d; consider this to be a 
constant for the moment, with the noise Gaussian 
distributed. Least-squares solution of eqn. corre- 
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sponds to ^-minimization, i.e., 

min \\Q- 1/2 Rs-Q- 1/2 d\\ 2 = jjmi(Rs-d) T Q~ 1 (Rs-d), 

5 S 

(17) 

where we've reverted to the superscript T notation 
to denote the vector transpose. For independent ob- 
servations in d, Q is diagonal and eqn. [n] reduces to 
the more familiar form of the \ 2 ■ Minimization of 
eqn. [Tt] implies the solution given by the condition 

RQ- l Rs = RQ~ l d. 



The iteration of eqn. O then becomes 



>+i) _ 



s^\RQ-H) k 
{RQ-iRsW) k 



(18) 



(19) 




Use of the iteration of K94 would require the calcu- 
lation of the singular factors for the matrix Q~ X / 2 R, 
but it's not at all clear what the answer means sta- 
tistically, since A carries much of the statistical in- 
formation in the SVD estimate (i.e., if we consider 
the estimate as an expansion in b l , then the X 2 are 
the statistical variances of the corresponding coeffi- 
cients). Since A is completely cancelled for K94's 
approach, this statistical information is lost. 

Let us now consider the case of Poisson noise, en- 
countered for photon counting experiments. It can be 
shown that maximizing the Poisson likelihood func- 
tion over the pixel fluxes s yields also implies a condi- 6 Examples 
tion of the form eqn. flty (| Wheaton et al.1995}), ex- 



Figure 1: Response for a single Compton scatter an- 
gle for a unit source located at pixel (16,16). This 
response is normalized to unit integral. Units are 
counts. 



as Richardson-Lucy ([Richardson 1972, Lucy 1974 1) 
from a different derivation. This iteration does con- 
verge to the Poisson Maximum Likelihood solution in 
terms of the pixel fluxes s. 



cept that Q now has a dependence on the solution 
given by 

Q = diag(fls). (20) 

Substituting the nth iterate s*- 71 -* and substituting into 
eqn. PHI we find 



>+i) _ 



tu RsM \ 



(21) 



where the division of vectors above is taken to be 
element-by-element. This is simply the Expectation 
Maximization Maximum Likelihood (EMML) algo- 
rithm ([ Hebert, Leahy, fc Singh 1990| ), also known 



To illustrate some aspects of the discussion above, we 
will show some results from an idealized scenario, as 
well as a more realistic simulation. As in K94, we 
employ a typical Compton telescope response for a 
single Compton scatter angle. The idealized PSF is 
computed over a 32 x 32 pixel grid, and shown in Fig- 
ure [l]. For simplicity, we don't worry about finite size 
or edge effects, and simply assume the PSF is nor- 
malized to unity, and wrap it around the boundaries 
as appropriate. The response R is computed from all 
possible translates of this PSF. 

Our first example is a unit source located at 
(16, 16), shown in Figure [| The "dataset" is simply 
the corresponding PSF, with no background or noise 
added. Solution via eqn. Sgives, not surprisingly, the 
exact answer, i.e. Figure]3. The regularized solution 
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10 15 20 25 30 



Figure 2: Test case 1, unit source at pixel (16,16). 
The corresponding "data" is noise-free, and simply 
given by the PSF in Figure [|. 



10 15 20 25 30 



Figure 4: Estimate for test case 1, using 100 steps of 
the iteration of eqn. O. 



■ 
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n 
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Figure 3: Estimated inverse for test case 1 via eqn. na. r ^ , f ± ± • inn + f 

b M 1 — 1 ligure 5: Estimate lor test case 1, using 100 steps oi 

Note that even though the data is noise-free, we have , , ., , . f r> T u Hoi 

. , , . ' the iteration of Paper 1, given by eqn. R2 . 
substantially underresolved the source. 
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5 10 15 20 25 30 

Figure 6: Test case 2, with multiple point sources of 
various strengths. 

of eqn. [l5| is shown in Figure 0, while the answer after 
100 iterations of eqs. [l3| and [l2| are shown in Figures [| 
and H respectively. Note that Figure || is quite spread 
compared to Figure 0, despite the fact that there is 
no noise or background in the "data" . This is an ad- 
mittedly extreme example, but a properly regularized 
technique would allow one to take the statistics into 
account by varying the degree of regularization. The 
cancellation of the singular values in eqn. [l5] implies 
that the bi corresponding to small singular values get 
larger weight in the solution. Since these bi typi- 
cally correspond to large-scale or smooth functions, 
oversmoothing is not a surprising effect. Comparison 
of Figures [| and || also demonstrate this limitation, 
since the results from eqn. ^ are inherently limited 
to be no better than the solution of eqn. [l^ in terms 
of resolving power. 

The second example uses several point sources of 
varying intensity, shown in Figure ^|. We generate 
data by convolution with the PSF and add a con- 
stant background such that the integrated source-to- 
background ratio is 10% (which would be extremely 
good for existing Compton telescopes, but useful for 
purposes of demonstration). Poisson random num- 
bers are then generated for these expected count lev- 
els, resulting in the simulated data shown in Figure [71 




5 10 15 20 



Figure 7: Data for test case 2, with a uniform 
background added to give an integrates source-to- 
background ratio of 10%, and Poisson noise. The 
overplotted contours show the noise-free data. Units 
are counts. 
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Figure 8: Direct least-squares estimate for test case 2. 
The large positive/negative fluctuations are the result 
of noise amplification from small singular values. 
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Figure 9: Estimate for test case 2 from eqn. [Dj. Note 
that the solution in much more stable compared with 

Figure ||, though noise roughly of the same magni- Figure 11: Estimate for test case 2 from the 100th 
tude as in the data is present, as expected from the iteration of eqn. p"3[ 
orthonormality of A and B. 
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Figure 10: Estimate for test case 2 after 100 steps of 
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the iteration of Paper I (eqn. 12). Note some regu- 
larization compared to Figure]^ due solely to stop- 
ping the iteration before convergence. Examination Figure 12: Estimate for test case 2 after 100 steps of 
of Figure || indicates that non-negativity does not EMML (eqn. 21). 
play a role, since due to the background level the 
stable fixed point solution is everywhere positive. 
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Figure 13: Estimate for test case 2 after 100 steps of 
yet another iteration, where we replace the A term in 
the denominator of eqn. [L2 with A 3 . 



For these simple examples, we make no attempt to 
fit or otherwise subtract the background, so the es- 
timates will include a uniform background level as 
well. The least-squares solution is shown in Figure ^, 
and exhibits precisely the large oscillations we wish 
to suppress with regularization. The regularized di- 
rect solution of eqn. [l^ is given in Figure ^| as we 
expect, the large oscillations are damped, since the 
small singular values have no effect, but plenty of 
noise is still evident. If we compare to Figure |Io| , 
computed via 100 iterations of of eqn. [j~2], we see bet- 
ter noise suppression. However, the result of 100 it- 
erations of eqn. [l3] in Figure [ll] is certainly qualita- 
tively better in terms of noise suppression and source 
resolution. The result from 100 steps of the EMML 
iteration of eqn. ^l] is shown in Figure [l^, and ap- 
pears comparable with Figure [ll]. Just for fun, we 
also computed a result where we replaced the A term 
in eqn. [l2] with A 3 (there is a A 2 term implicit in 
eqn. 031). Shown in Figure [l| this map seems qual- 
itatively better yet. Quantitatively, of course, only 
eqn. [l^ will give correct photometry, since it corre- 
sponds to the case where we use the proper general- 
ized inverse. 



7 Final comments 

I have shown above that the deconvolution method 
of Kebede (1994) appears to be erroneously derived. 
Kebede's final expression (eqn. |l^), however, does 
provide a regularized estimate of the inverse, where 
the regularization is caused by the exact cancellation 
of the singular values. It is left to the reader to decide 
if this is a positive characteristic of Kebede's pub- 
lished algorithm, though the above discussion and 
examples would seem to indicate that it does not 
perform particularly well, even when compared with 
similar simple approaches. I have showed how to ex- 
plicitly include statistical information, but also that 
much of this is lost due to the cancellation of the 
singular values. 

I close with one final comment on the efficiency 
of the method. K94 claims that ". . . it takes very 
little computing time to run a program written based 
on this iterative method regardless of the size of the 
problem." However, this is clearly not true. The 
computational complexity of SVD scales very badly 
with the problem size, going like the aMN 2 + bN 3 for 
an M x N matrix ( |Golub fc Van Loan 1989[ ). For 
square image and data with J x J pixels, this would 
be 0( J 6 ), which is terrible. Whether one uses eqn. [l^ 
or eqn. |l5|, the SVD must be calculated explicitly, 
which would impose a heavy computational burden 
for all but very small images. 
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